The iohelper currently calls saferead() to get data from the
underlying file. This has a problem with O_DIRECT when hitting
end-of-file. saferead() is asked to read 1MB, but the first
read() it does may return only a few KB, so it'll try another
read() to fill the remaining buffer. Unfortunately the buffer
pointer passed into this 2nd read() is likely not aligned
to the extent that O_DIRECT requires, so rather than seeing
'0' for end-of-file, we'll get -1 + EINVAL due to misaligned
buffer.
The way the iohelper is currently written, it already handles
getting short reads, so there is actually no need to use
saferead() at all. We can simply call read() directly. The
benefit of this is that we can now write() the data immediately
so when we go into the subsequent reads() we'll always have a
correctly aligned buffer.
Technically the file position ought to be aligned for O_DIRECT
too, but this does not appear to matter when at end-of-file.
Tested-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add the backing parse and a test case to verify parsing of VxHS
backing storage.
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add a new virStorageNetProtocol for Veritas HyperScale (VxHS) disks
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
The mlx4 (Mellanox) netdev driver implements the sysfs phys_port_id
file for both VFs and PFs, so you can find the VF netdev plugged into
the same physical port as any given PF netdev by comparing the
contents of phys_port_id of the respective netdevs. That's what
libvirt does when attempting to find the PF netdev for a given VF
netdev (or vice versa).
Most other netdev's drivers don't implement phys_port_id, so the file
is visible in sysfs directory listing, but attempts to read it result
in ENOTSUPP. In these cases, libvirt is unable to read phys_port_id of
either the PF or the VF, so it just returns the first entry in the
PF/VF's list of netdevs.
But we've found that the i40e driver is in between those two
situations - it implements phys_port_id for PF netdevs, but doesn't
implement it for VF netdevs. So libvirt would successfully read the
phys_port_id of the PF netdev, then try to find a VF netdev with
matching phys_port_id, but would fail because phys_port_id is NULL for
all VFs. This would result in a message like the following:
Could not find network device with phys_port_id '3cfdfe9edc39'
under PCI device at /sys/class/net/ens4f1/device/virtfn0
To solve this problem in a way that won't break functionality for
anyone else, this patch saves the first netdev name we find for the
device, and returns that if we fail to find a netdev with the desired
phys_port_id.
Instead of checking for all possible constants that every
kernel header with devlink support should have (and defining
HAVE_DECL_DEVLINK as 1 if any of them is present due to the
way AC_CHECK_DECLS works), only check for DEVLINK_CMD_ESWITCH_GET.
This is the name of the constant since kernel 4.11. Between 4.8
and 4.11, the now deprecated spelling DEVLINK_CMD_ESWITCH_MODE_GET
was used.
Assume DEVLINK_ESWITCH_MODE_SWITCHDEV is available, since it was
introduced along with the deprecated spelling.
Adding functionality to libvirt that will allow querying the interface
for the availability of switchdev Offloading NIC capabilities.
The switchdev mode was introduced in kernel 4.8, the iproute2-devlink
command to retrieve the switchdev NIC feature with command example:
devlink dev eswitch show pci/0000:03:00.0
This feature is needed for Openstack so we can do a scheduling decision
if the NIC is in Hardware Offload (switchdev) or regular SR-IOV (legacy) mode.
And select the appropriate hypervisors with the requested capability see [1].
[1] - https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/enable-sriov-nic-features.html
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Some cleanup paths overwrite a usefull error message with a less useful
one and we then try to preserve the original message. The handlers added
in this patch will simplify the operations since they are designed right
for the purpose.
TPM 2 does not implement sysfs files for cancellation of commands.
We therefore use /dev/null for the cancel path passed to QEMU.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Although not previously explicitly documented, the expectation for
the libvirt event loop is that an implementation is registered early
in application startup, before calling any libvirt APIs and then
run forever after. Replacing a previously registered event loop is
not safe & subject to races even if virConnectClose has been called
on open handles, due to delayed deregistration of callbacks during
conenction close.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We can now check for the error and not care about the return value as
it will be properly handled in virBufferContentAndReset() anyway.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The function is useful even without using the return value. And if
needed, the return value can be obtained by other calls as well. The
potential for clean-up can be seen in the following patch.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 94c465d0 refactored the logging setup phase but introduced an
issue, where the daemon ignores verbose mode when there are no outputs
defined and the default must be used. The problem is that the default
output was determined too early, thus ignoring the potential '--verbose'
option taking effect. This patch postpones the creation of the default
output to the very last moment when nothing else can change. Since the
default output is only created during the init phase, it's safe to leave
the pointer as NULL for a while, but it will be set eventually, thus not
affecting runtime.
Patch also adjusts both the other daemons.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1442947
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This helper allows you to better structurize the code if some element
may or may not contains attributes and/or child elements.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The helper returns true if a string contains any of the given chars.
virStringHasControlChars can be reimplemented using that helper.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
It's equivalent of calling virXPathString("string(.)", ctxt) but it
doesn't have to use the XPath resolving and parsing.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The virXMLPropStringLimit is an equivalent of virXPathStringLimit
which should be preferred if you already have a XML dom node or
if you need to parse more than one property.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Move networkMacMgrFileName into src/util/virmacmap.c and rename to
virMacMapFileName. We're about to move some more MacMgr processing
files into virnetworkobj and it doesn't make sense to have this helper
in the driver or in virnetworkobj.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than assuming that what's passed to virObject{Ref|Unref}
would be a virObjectPtr as long as it's not NULL, let's do the
similar checks virObjectIsClass in order to prevent a possible
increment or decrement to some field at the obj->u.s.refs offset.
Signed-off-by: John Ferlan <jferlan@redhat.com>
The virObjectIsClass API has only ever checked object validity
based on if the @obj is not NULL and it was derived from some class.
While this has worked well in general, there is one additional
check that could be made prior to calling virClassIsDerivedFrom
which loops through the classes checking the magic number against
the klass expected magic number.
If by chance a non virObject is passed, rather than assuming the
void * @obj is a _virObject and thus offsetting to obj->klass,
obj->magic, and obj->parent, let's check that the void * @obj
has at least the "base part" of the magic number in the right
place and generate a more specific VIR_WARN message if not.
There are many consumers to virObjectIsClass, include the locking
primitives virObject{Lock|Unlock}, virObjectRWLock{Read|Write},
and virObjectRWUnlock. For those callers, the locking call will
not fail, but it also will not attempt a virMutex* call which
will "most likely" fail since the &obj->lock is used.
In order to avoid some possible future wrap on the 0xCAFExxxx
value, add a check during initialization that some new class
won't cause the wrap. Should be good for a few years at least!
It is still left up to the caller to handle the failed API calls
just as it would be if it passed a NULL opaque pointer anyobj.
If virObjectIsClass fails "internally" to virobject.c, create a
macro to generate the VIR_WARN describing what the problem is.
Also improve the checks and message a bit to indicate which was
the failure - whether the obj was NULL or just not the right class
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than overload virObjectUnlock as commit id '77f4593b' has
done, create a separate virObjectRWUnlock API that will force the
consumers to make the proper decision regarding unlocking the
RWLock's. Similar to the RWLockRead and RWLockWrite, use the
virObjectGetRWLockableObj helper. This restores the virObjectUnlock
code to using the virObjectGetLockableObj.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Introduce a helper to handle the error path more cleanly. The same
as virObjectGetLockableObj in order to essentially follow the original
logic of commit 'b545f65d' to ensure that the input argument at least
has some validity before using.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Now that virObjectRWLockWrite exists to handle the virObjectRWLockable
objects, let's restore virObjectLock to only handle virObjectLockable
class locks. There still exists the possibility that the input @anyobj
isn't a valid object and the resource isn't truly locked, but that
also exists before commit id '77f4593b'.
This also restores some logic that commit id '77f4593b' removed
with respect to a common code path that commit id '10c2bb2b' had
introduced as virObjectGetLockableObj. This code path merely does
the same checks as the original virObjectLock commit 'b545f65d',
but in callable/reusable helper to ensure the @obj at least has
some validity before using.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Instead of making virObjectLock be the entry point for two
different types of locks, let's create a virObjectRWLockWrite API
which will only handle the virObjectRWLockableClass objects.
Use the new virObjectRWLockWrite for the virdomainobjlist code
in order to handle the Add, Remove, Rename, and Load operations
that need to be very synchronous.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since the class it represents is based on virObjectRWLockableClass
and in order to make sure we differentiate just in case anyone somehow
believes they could use virObjectLockRead for a virObjectLockableClass,
let's rename the API to use the RW in the name. Besides the RW locks
refer to pthread_rwlock_{init|rdlock|wrlock|unlock|destroy} while the
other locks refer to pthread_mutex_{init|lock|unlock|destroy}.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This way later patches can add another structures with virResctrl
prefix without the meaning being even more confusing than it needs to
be.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
That means that returning negative values means error and non-negative
values differ in meaning, but are all successful.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
It doesn't access anything from conf/ and ti will be needed to use
from other util/ places. This split makes the separation clearer.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 81fb440b further qualified an if statement by adding the
boolean saveVlan to the condition. Coverity pointed out that this
change in the logic eliminated the need to check saveVlan in an
argument to virAsprintf().
Commit 9a94af6d restructured virHostdevReadNetConfig() so that it
would manually set ret = 0 after successfully reading the device's
config, but Coverity pointed out that "ret = 0" was erroneously placed
outside of an "else" clause, meaning that the the value of ret set in
the "if" clause was unnecessarily and incorrectly overwritten.
This patch moves ret = 0 into the else clause, which should silence
Coverity.
When using a VF from an SRIOV-capable network card in a guest (either
in macvtap passthrough mode, or via VFIO PCI device assignment), The
associated PF netdev must be online in order for the VF to be usable
by the guest. The guest, however, is not able to change the state of
the PF. And libvirt *could* set the PF online as needed, but that
could lead to the host receiving unexpected IPv6 traffic (since the
default for an unconfigured interface is to participate in IPv6
autoconf). For this reason, before assigning a VF to a guest, libvirt
verifies that the related PF netdev is online - if it isn't, then we
log an error and don't allow the guest startup to continue.
Until now, this check was done during virNetDevSetNetConfig(). This
works nicely because the same function is called both for macvtap
passthrough and for VFIO device assignment. But in the case of VFIO,
the VF has already been unbound from its netdev driver by the time we
get to virNetDevSetNetConfig(), and in the case of dual port Mellanox
NICs that have their VFs setup in single port mode, the *only* way to
determine the proper PF netdev to query for online status is via the
"phys_port_id" file that is in the VF netdev's sysfs directory. *BUT*
if we've unbound the VF from the netdev driver, then it doesn't *have*
a netdev sysfs directory.
So, in order to check the correct PF netdev for online status, this
patch moved the check earlier in the setup, into
virNetDevSaveNetConfig(), which is called *before* unbinding the VF
from its netdev driver.
(Note that this implies that if you are using VFIO device assignment
for the VFs of a Mellanox NIC that has the VFs programmed in single
port mode, you must let the VFs be bound to their net driver and use
"managed='yes'" in the device definition. To be more specific, this is
only true if the VFs in single port mode are using port *2* of the PF
- if the VFs are using only port 1, then the correct PF netdev will be
arrived at by default/chance))
This resolves: https://bugzilla.redhat.com/267191
virHostdevRestoreNetConfig() calls virNetDevReadNetConfig() to try and
read the "original config" of a netdev, and if that fails, it tries
again with a different directory/netdev name. This achieves the
desired effect (we end up finding the config wherever it may be), but
for each failure, virNetDevReadNetConfig() places a nice error message
in the system logs. Experience has shown that false-positive error
logs like this lead to erroneous bug reports, and can often mislead
those searching for *real* bugs.
This patch changes virNetDevReadNetConfig() to explicitly check if the
file exists before calling virFileReadAll(); if it doesn't exist,
virNetDevReadNetConfig() returns a success, but leaves all the
variables holding the results as NULL. (This makes sense if you define
the purpose of the function as "read a netdev's config from its config
file *if that file exists*).
To take advantage of that change, the caller,
virHostdevRestoreNetConfig() is modified to fail immediately if
virNetDevReadNetConfig() returns an error, and otherwise to try the
different directory/netdev name if adminMAC & vlan & MAC are all NULL
after the preceding attempt.
Mellanox ConnectX-3 dual port SRIOV NICs present a bit of a challenge
when assigning one of their VFs to a guest using VFIO device
assignment.
These NICs have only a single PCI PF device, and that single PF has
two netdevs sharing the single PCI address - one for port 1 and one
for port 2. When a VF is created it can also have 2 netdevs, or it can
be setup in "single port" mode, where the VF has only a single netdev,
and that netdev is connected either to port 1 or to port 2.
When the VF is created in dual port mode, you get/set the MAC
address/vlan tag for the port 1 VF by sending a netlink message to the
PF's port1 netdev, and you get/set the MAC address/vlan tag for the
port 2 VF by sending a netlink message to the PF's port 2 netdev. (Of
course libvirt doesn't have any way to describe MAC/vlan info for 2
ports in a single hostdev interface, so that's a bit of a moot point)
When the VF is created in single port mode, you can *set* the MAC/vlan
info by sending a netlink message to *either* PF netdev - the driver
is smart enough to understand that there's only a single netdev, and
set the MAC/vlan for that netdev. When you want to *get* it, however,
the driver is more accurate - it will return 00:00:00:00:00:00 for the
MAC if you request it from the port 1 PF netdev when the VF was
configured to be single port on port 2, or if you request if from the
port 2 PF netdev when the VF was configured to be single port on port
1.
Based on this information, when *getting* the MAC/vlan info (to save
the original setting prior to assignment), we determine the correct PF
netdev by matching phys_port_id between VF and PF.
(IMPORTANT NOTE: this implies that to do PCI device assignment of the
VFs on dual port Mellanox cards using <interface type='hostdev'>
(i.e. if you want the MAC address/vlan tag to be set), not only must
the VFs be configured in single port mode, but also the VFs *must* be
bound to the host VF net driver, and libvirt must use managed='yes')
By the time libvirt is ready to set the new MAC/vlan tag, the VF has
already been unbound from the host net driver and bound to
vfio-pci. This isn't problematic though because, as stated earlier,
when a VF is created in single port mode, commands to configure it can
be sent to either the port 1 PF netdev or the port 2 PF netdev.
When it is time to restore the original MAC/vlan tag, again the VF
will *not* be bound to a host net driver, so it won't be possible to
learn from sysfs whether to use the port 1 or port 2 PF netdev for the
netlink commands. And again, it doesn't matter which netdev you
use. However, we must keep in mind that we saved the original settings
to a file called "${PF}_${VFNUM}". To solve this problem, we just
check for the existence of ${PF1}_${VFNUM} and ${PF2}_${VFNUM}, and
use whichever one we find (since we know that only one can be there)
This patch updates functions in netdev.c to pay attention to
phys_port_id. It uses the new function virNetDevGetPhysPortID() to
learn the phys_port_id of a VF or PF, then sends that info to
virPCIGetNetName(), which has newly been modified to take an optional
phys_port_id.
A single PCI device may have multiple netdevs associated with it. Each
of those netdevs will have a different phys_port_id entry in
sysfs. This patch modifies virPCIGetNetName() to allow selecting one
of the potential many netdevs in two different ways:
1) by setting the "idx" argument, the caller can select the 1st (0),
2nd (1), etc. netdev from the PCI device's net subdirectory.
2) If the physPortID arg is set (to a null-terminated string) then
virPCIGetNetName() returns the netdev that has that phys_port_id in
the sysfs file of the same name in the netdev's directory.
On Linux each network device *can* (but not necessarily *does*) have
an attribute called phys_port_id which can be read from the file of
that name in the netdev's sysfs directory. The examples I've seen have
been a many-digit hexadecimal number (as an ASCII string).
This value can be useful when a single PCI device is associated with
multiple netdevs (e.g a dual port Mellanox SR-IOV NIC - this card has
a single PCI Physical Function (PF), and that PF has two netdevs
associated with it (the "net" subdirectory of the PF in sysfs has two
links rather than the usual single link to a netdev directory). Each
of the PF netdevs has a different phys_port_id. The Virtual Functions
(VF) are similar - the PF (a PCI device) has "n" VFs (also each of
these is a PCI device), each VF has two netdevs, and each of the VF
netdevs points back to the VF PCI device (with the "device" entry in
its sysfs directory) as well as having a phys_port_id matching the PF
netdev it is associated with.
virNetDevGetPhysPortID() simply attempts to read the phys_port_id for
the given netdev and return it to the caller. If this particular
netdev driver doesn't support phys_port_id, it returns NULL (*not* a
NULL-terminated string, but a NULL pointer) but still counts it as a
success.
We are given a string in @machinename, we never allocate it, just
merely use it for reading. We should not free it otherwise it
leads to double free:
==32191== Thread 17:
==32191== Invalid free() / delete / delete[] / realloc()
==32191== at 0x4C2D1A0: free (vg_replace_malloc.c:530)
==32191== by 0x54BBB84: virFree (viralloc.c:582)
==32191== by 0x2BC04499: qemuProcessStop (qemu_process.c:6313)
==32191== by 0x2BC500FF: processMonitorEOFEvent (qemu_driver.c:4724)
==32191== by 0x2BC502FC: qemuProcessEventHandler (qemu_driver.c:4769)
==32191== by 0x5550640: virThreadPoolWorker (virthreadpool.c:167)
==32191== by 0x554FBCF: virThreadHelper (virthread.c:206)
==32191== by 0x8F913D3: start_thread (in /lib64/libpthread-2.23.so)
==32191== by 0x928DE3C: clone (in /lib64/libc-2.23.so)
==32191== Address 0x31893d70 is 0 bytes inside a block of size 1,100 free'd
==32191== at 0x4C2D1A0: free (vg_replace_malloc.c:530)
==32191== by 0x54BBB84: virFree (viralloc.c:582)
==32191== by 0x54C1936: virCgroupValidateMachineGroup (vircgroup.c:343)
==32191== by 0x54C4B29: virCgroupNewDetectMachine (vircgroup.c:1550)
==32191== by 0x2BBDDA29: qemuConnectCgroup (qemu_cgroup.c:972)
==32191== by 0x2BC05DA7: qemuProcessReconnect (qemu_process.c:6822)
==32191== by 0x554FBCF: virThreadHelper (virthread.c:206)
==32191== by 0x8F913D3: start_thread (in /lib64/libpthread-2.23.so)
==32191== by 0x928DE3C: clone (in /lib64/libc-2.23.so)
==32191== Block was alloc'd at
==32191== at 0x4C2BE80: malloc (vg_replace_malloc.c:298)
==32191== by 0x4C2E35F: realloc (vg_replace_malloc.c:785)
==32191== by 0x54BB492: virReallocN (viralloc.c:245)
==32191== by 0x54BEDF2: virBufferGrow (virbuffer.c:150)
==32191== by 0x54BF3B9: virBufferVasprintf (virbuffer.c:408)
==32191== by 0x54BF324: virBufferAsprintf (virbuffer.c:381)
==32191== by 0x55BB271: virDomainGenerateMachineName (domain_conf.c:27078)
==32191== by 0x2BBD5B8F: qemuDomainGetMachineName (qemu_domain.c:9595)
==32191== by 0x2BBDD9B4: qemuConnectCgroup (qemu_cgroup.c:966)
==32191== by 0x2BC05DA7: qemuProcessReconnect (qemu_process.c:6822)
==32191== by 0x554FBCF: virThreadHelper (virthread.c:206)
==32191== by 0x8F913D3: start_thread (in /lib64/libpthread-2.23.so)
Moreover, make the @machinename 'const char *' to mark it
explicitly that we are not changing the passed string.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The new virFileCache will nicely handle the caching logic for any data
that we would like to cache. For each type of data we will just need
to implement few handlers that will take care of creating, validating,
loading and saving the cached data.
The cached data must be an instance of virObject.
Currently we cache QEMU capabilities which will start using
virFileCache.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
It is more related to a domain as we might use it even when there is
no systemd and it does not use any dbus/systemd functions. In order
not to use code from conf/ in util/ pass machineName in cgroups code
as a parameter. That also fixes a leak of machineName in the lxc
driver and cleans up and de-duplicates some code.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This reverts commit 328bd24443.
As it turns out, this is not portable and very Linux & glibc
specific. Worse, this may lead to not starving writers on Linux
but everywhere else. Revert this and if the starvation occurs
resolve it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Up until now we only had virObjectLockable which uses mutexes for
mutually excluding each other in critical section. Well, this is
not enough. Future work will require RW locks so we might as well
have virObjectRWLockable which is introduced here.
Moreover, polymorphism is introduced to our code for the first
time. Yay! More specifically, virObjectLock will grab a write
lock, virObjectLockRead will grab a read lock then (what a
surprise right?). This has great advantage that an object can be
made derived from virObjectRWLockable in a single line and still
continue functioning properly (mutexes can be viewed as grabbing
write locks only). Then just those critical sections that can
grab a read lock need fixing. Therefore the resulting change is
going to be way smaller.
In order to avoid writer starvation, the object initializes RW
lock that prefers writers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We already have virRWLockInit. But this uses pthread defaults
which prefer reader to initialize the RW lock. This may lead to
writer starvation. Therefore we need to have the counterpart that
prefers writers. Now, according to the
pthread_rwlockattr_setkind_np() man page setting
PTHREAD_RWLOCK_PREFER_WRITER_NP attribute is no-op. Therefore we
need to use PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP
attribute. So much for good enum value names.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Currently, @port is type of string. Well, that's overkill and
waste of memory. Port is always an integer. Use it as such.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
A new function virNetDevOpenvswitchUpdateVlan has been created to instruct
OVS of the changes. qemuDomainChangeNet has been modified to handle the
update of the VLAN configuration for a running guest and rely on
virNetDevOpenvswitchUpdateVlan to do the actual update if needed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
While searching for an element using a function it may be
desirable to know the element key for future operation.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The purpose of this function is to tell if the current position
in given FD is in data section or a hole and how much bytes there
is remaining until the end of the section. This is achieved by
couple of lseeks(). The most important part is that we reposition
the FD back, so that the position is unchanged from the caller
POV. And until now the final lseek() back to the original
position was done with no check for errors. And I was convinced
that that's okay since nothing can go wrong. However, review
feedback from a related series persuaded me, that it's better to
be safe than sorry. Therefore, lets check if the final lseek()
succeeded and if it doesn't report an error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Fill them in right away rather than having to figure out at runtime
whether they are necessary or not.
virStorageSourceNetworkDefaultPort does not need to be exported any
more.
This reverts commit e4b980c853.
When a binary links against a .a archive (as opposed to a shared library),
any symbols which are marked as 'weak' get silently dropped. As a result
when the binary later runs, those 'weak' functions have an address of
0x0 and thus crash when run.
This happened with virtlogd and virtlockd because they don't link to
libvirt.so, but instead just libvirt_util.a and libvirt_rpc.a. The
virRandomBits symbols was weak and so left out of the virtlogd &
virtlockd binaries, despite being required by virHashTable functions.
Various other binaries like libvirt_lxc, libvirt_iohelper, etc also
link directly to .a files instead of libvirt.so, so are potentially
at risk of dropping symbols leading to a later runtime crash.
This is normal linker behaviour because a weak symbol is not treated
as undefined, so nothing forces it to be pulled in from the .a You
have to force the linker to pull in weak symbols using -u$SYMNAME
which is not a practical approach.
This risk is silent bad linkage that affects runtime behaviour is
not acceptable for a fix that was merely trying to fix the test
suite. So stop using __weak__ again.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the scan of the /proc/mounts file used to find cgroup mount
points doesn't take into account that mount points may hidden by other
mount points. For, example in certain Kubernetes environments the
/proc/mounts contains the following lines:
cgroup /sys/fs/cgroup/net_prio,net_cls cgroup ...
tmpfs /sys/fs/cgroup tmpfs ...
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup ...
In this particular environment the first mount point is hidden by the
second one. The correct mount point is the third one, but libvirt will
never process it because it only checks the first mount point for each
controller (net_cls in this case). So libvirt will try to use the first
mount point, which doesn't actually exist, and the complete detection
process will fail.
To avoid that issue this patch changes the virCgroupDetectMountsFromFile
function so that when there are duplicates it takes the information from
the last line in /proc/mounts. This requires removing the previous
explicit condition to skip duplicates, and adding code to free the
memory used by the processing of duplicated lines.
Related-To: https://bugzilla.redhat.com/1468214
Related-To: https://github.com/kubevirt/libvirt/issues/4
Signed-off-by: Juan Hernandez <jhernand@redhat.com>
Currently all mockable functions are annotated with the 'noinline'
attribute. This is insufficient to guarantee that a function can
be reliably mocked with an LD_PRELOAD. The C language spec allows
the compiler to assume there is only a single implementation of
each function. It can thus do things like propagating constant
return values into the caller at compile time, or creating
multiple specialized copies of the function body each optimized
for a different caller. To prevent these optimizations we must
also set the 'noclone' and 'weak' attributes.
This fixes the test suite when libvirt.so is built with CLang
with optimization enabled.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The HOST_NAME_MAX, INET_ADDRSTRLEN and VIR_LOOPBACK_IPV4_ADDR
constants are only used by a handful of files, so are better
kept in virsocketaddr.h or the source file that uses them.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a value of the first level object contains more objects needing
deflattening which would be wrapped in an actual object the function
would not recurse into them.
By this simple addition we can fully deflatten the objects.
As it turns out sometimes users pass in an arbitrarily nested structure
e.g. for the qemu backing chains JSON pseudo protocol. This new
implementation deflattens now a single object fully even with nested
keys.
Additionally it's not necessary now to stick with the "file." prefix for
the properties.
Currently the function would deflatten the object by dropping the 'file'
prefix from the attributes. This does not really scale well or adhere to
the documentation.
Until we refactor the worker to properly deflatten everything we at
least simulate it by adding the "file" wrapper object back.
virCommand is a version of virExec that doesn't fork, however it is
just calling execve and doesn't honors setting uid/gid and pwd.
This commit extrac those pieces from virExec() to a virExecCommon()
function that is called from both virExec() and virCommandExec().
Problem with our error reporting is that the error object is a
thread local variable. That means if there's an error reported
within the I/O thread it gets logged and everything, but later
when the event loop aborts the stream it doesn't see the original
error. So we are left with some generic error. We can do better
if we copy the error message between the threads.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When the I/O thread quits (e.g. due to an I/O error, lseek()
error, whatever), any subsequent virFDStream API should return
error too. Moreover, when invoking stream event callback, we must
set the VIR_STREAM_EVENT_ERROR flag so that the callback knows
something bad happened.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1371892
As it turns out the volume create, build, and refresh path was not peeking
at the meta data, so immediately after a create operation the value displayed
for capacity was still incorrect. However, if a pool refresh was done the
correct value was fetched as a result of a meta data peek.
The reason is it seems historically if the file type is RAW then peeking
at the file just took the physical value for the capacity. However, since
we know if it's an encrypted file, then peeking at the meta data will be
required in order to get a true capacity value.
So check for encryption in the source and if present, use the meta data
in order to fill in the capacity value and set the payload_offset.
https://bugzilla.redhat.com/show_bug.cgi?id=1461270
When fetching stats for a vhost-user type of interface, we run
couple of ovs-vsctl commands and parse their output. However, not
all stats exist at all times, for instance "rx_dropped" or
"tx_errors" can be missing. Thing is, we ask for a bulk of
statistics and if one of them is missing an error is reported
instead of returning the rest. Since we ignore errors, we fail to
set statistics. Fix this by asking for each piece alone.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 5c54d29aae forgot to do that when moving the only function
using it and it broke the build on some platforms.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This commit fixes a locale problem with locales that use comma as a mantissa
separator. Example: 12.34 en_US = 12,34 pt_BR. Since strtod() is a non-safe
function, virStrToDouble() will have problems to parse double numbers from
kernel settings and other double numbers from static files (XMLs, JSONs, etc).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1457634
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1457481
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
The function virDoubleToStr() is defined in virutil.* and virStrToDouble() is
defined in virstring.*. Joining both functions into the same file makes more
sense.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Starting from qemu 2.9, more granular options are supported. Add parser
for the relevant bits.
With this patch libvirt is able to parse the host and target IQN of from
the JSON pseudo-protocol specification.
This corresponds to BlockdevOptionsIscsi in qemu qapi.
'SocketAddress' structure was changed to contain 'inet' instead of
'tcp' since qemu commit c5f1ae3ae7b. Existing entries have a backward
compatibility layer.
Libvirt will parse 'inet' and 'tcp' as equivalents.
The same json strucutre is used for NBD and sheepdog volumes for
specifying of the host. Rename the function and fix up error messages to
be more universal.
Change the settings from qemuDomainUpdateDeviceLive() as otherwise the
call would succeed even though nothing has changed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1414627
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
With glibc >= 2.25.90 writev() is only available if you explicitly
include sys/uio.h. This matches the documented requirements, but
older glibc and other *NIX pulled in writev indirectly so the bug
wasn't noticed previously.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use ATTRIBUTE_FALLTHROUGH, introduced by commit
5d84f5961b, instead of comments to
indicate that the fall through is an intentional behavior.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1459091
We try to get the last element of the passed path by calling
strrch(path, '/'). However, the pointer that strrchr() returns
points at the slash, We want string that starts right after that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There's a problem with current streams after I switched them from
iohelper to thread implementation. Previously, iohelper made sure
not to exceed specified @length resulting in the pipe EOF
appearing at the exact right moment (the pipe was used to tunnel
the data from the iohelper to the daemon). Anyway, when switching
to thread I had to write the I/O code from scratch. Whilst doing
that I took an inspiration from the iohelper code, but since the
usage of pipe switched to slightly different meaning, there was
no 1:1 relationship between the codes.
Moreover, after introducing VIR_FDSTREAM_MSG_TYPE_HOLE, the
condition that should made sure we won't exceed @length was
completely wrong.
The fix is to:
a) account for holes for @length
b) cap not just data sections but holes too (if @length would be
exceeded)
For this purpose, the condition needs to be brought closer to the
code that handles holes and data sections.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The host address or the socket path have already been checked at the
begining of the function virStorageSourceParseNBDColonString(). So,
when the parameter is not a unix socket, there is no reason to check
the address again because if it does not exists, the logic will fail
in the first IF conditional.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
VIR_STRDUP returns -1 if the string copy was not successful. So, the
current comparison/logic is throwing an error when VIR_STRDUP() returns
1. Only when source is NULL, it is considering as a success which is
not right.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Callers expect the return value to be the total number of vcpus in the
host (including offline vcpus). The refactor in c67e04e25f
broke this assumption by using virHostCPUGetOnlineBitmap which only
creates a bitmap long enough to hold the last online vcpu.
Report the full number of host vcpus by returning value from
virHostCPUGetCount().
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
We will need some convenient helper functions for managing sysfs-entries
for fibre channel-backed devices. Let's implement them and make them
available in the private API.
Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
The @ipv6_host allocated in virAsprintf may be lost when virAsprintf
addrstr failed.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Many vendor id's and product id's have leading zeros. We should show
them in the logs.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Reviewed-by: Laine Stump <laine@laine.org>
@reply is a DBusMessage object returned by virDBusCallMethod in
get machine object call path, dereference it before calling
virDBusCallMethod again to get machine name.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Some older systems (such as RHEL6) lack SEEK_HOLE and SEEK_DATA
which virFileInData relies on. Provide a stub for these systems.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, we don't assign any meaning to that. Our current view
on virStream is that it's merely a pipe. And pipes don't support
seeking.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit e3ba4025 introduced srv->handles and VIR_RESIZE_N to allocate
@handles as necessary, but did not free the handles during when calling
virNetlinkEventServiceStop.
Commit 15a71e60 introduced the virNetlinkEventServiceStopAll function, and
the code in virNetlinkEventServiceStop is copied to this function, so just
call virNetlinkEventServiceStop instead.
Namely, this patch is about virMediatedDeviceGetIOMMUGroup{Dev,Num}
functions. There's no compelling reason why these functions should take
an object, on the contrary, having to create an object every time one
needs to query the IOMMU group number, discarding the object afterwards,
seems odd.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Commit 8e09663 "pci: recognize/report GEN4 (PCIe 4.0) card 16GT/s Link
speed" introduced another speed into enum, but mistakenly also altered
field width, so one bit of link width was included there.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Basically, what is needed here is to introduce new message type
for the messages passed between the event loop callbacks and the
worker thread that does all the I/O. The idea is that instead of
a queue of read buffers we will have a queue where "hole of size
X" messages appear. That way the event loop callbacks can just
check the head of the queue and see if the worker thread is in
data or a hole section and how long the section is.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function takes a FD and determines whether the current
position is in data section or in a hole. In addition to that,
it also determines how much bytes are there remaining till the
current section ends.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
One big downside of using the pipe to transfer the data is that
we can really transfer just bare data. No metadata can be carried
through unless some formatted messages are introduced. That would
be quite painful to achieve so let's use a message queue. It's
fairly easy to exchange info between threads now that iohelper is
no longer used.
The reason why we cannot use the FD for plain files directly is
that despite us setting noblock flag on the FD, any
read()/write() blocks regardless (which is a show stopper since
those parts of the code are run from the event loop) and poll()
reports such FD as always readable/writable - even though the
subsequent operation might block.
The pipe is still not gone though. It is used to signal the event
loop that an event occurred (e.g. data is available for reading
in the queue, or vice versa).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The QEMU default is GICv2, and some of the code in libvirt
relies on the exact value. Stop pretending that's not the
case and use GICv2 explicitly where needed.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
The metadata libvirt cares about is identical for version 3
as for previous versions, so we merely need list the new
version number.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Simply tries to match the provided regex on a string and returns
the result. Useful if caller don't care about the matched substring
and want to just test if some pattern patches a string.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
GCC complains that inlining virStringTrimOptionalNewline is not
likely on some platforms:
cc1: warnings being treated as errors
../../src/util/virfile.c: In function 'virFileReadValueBitmap':
../../src/util/virstring.h:292: error: inlining failed in call to 'virStringTrimOptionalNewline': call is unlikely and code size would grow [-Winline]
../../src/util/virfile.c:3987: error: called from here [-Winline]
Inlining this function is not going to be a measurable performance
benefit either, since the time required to execute it is going to
be dominated by running of strlen() over the string, not by the
function call overhead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It is no longer needed thanks to the great virfilewrapper.c. And this
way we don't have to add a new set of functions for each prefixed
path.
While on that, add two functions that weren't there before, string and
scaled integer reading ones. Also increase the length of the string
being read by one to accompany for the optional newline at the
end (i.e. change INT_STRLEN_BOUND to INT_BUFSIZE_BOUND).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
So, because mingw is somehow OK with dereferencing a pointer within a
VIR_DEBUG macro, compared to outside of it to which it complained with a
"potential NULL pointer dereference" error (still a false positive), we
can make the code a tiny bit cleaner.
Sighed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
After bdcf6e481 there is a crasher in libvirt. The commit assumes
that priv->perf is always set. That is not true. For inactive
domains, the priv->perf is not allocated as it is set in
qemuProcessLaunch(). Now, usually we differentiate between
accesses to inactive and active definition and it works just
fine. Except for 'domstats'. There priv->perf is accessed without
prior check for domain inactivity. While we could check for that,
more robust solution is to make virPerfEventIsEnabled() accept
NULL.
How to reproduce:
1) ensure you have at least one inactive domain
2) virsh domstats
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
This patch fixes the following MinGW error (although actually being a
false positive):
../../src/util/virmdev.c: In function 'virMediatedDeviceListMarkDevices':
../../src/util/virmdev.c:453:21: error: potential null pointer
dereference [-Werror=null-dereference]
const char *mdev_path = mdev->path;
^~~~~~~~~
Signed-off-by: Erik Skultety <eskultet@redhat.com>
The problem resides in virHostdevUpdateActiveMediatedDevices which gets
called during qemuProcessReconnect. The issue here is that
virMediatedDeviceListAdd takes a pointer to the item to be added to the
list to which VIR_APPEND_ELEMENT is used, which also clears the pointer.
However, in this case only the local copy of the pointer got cleared,
leaving the original pointing to valid memory. To sum it up, during
cleanup phase, the original pointer is freed and the daemon crashes
basically any time it would access it.
Backtrace:
0x00007ffff3ccdeba in __strcmp_sse2_unaligned
0x00007ffff72a444a in virMediatedDeviceListFindIndex
0x00007ffff7241446 in virHostdevReAttachMediatedDevices
0x00007fffc60215d9 in qemuHostdevReAttachMediatedDevices
0x00007fffc60216dc in qemuHostdevReAttachDomainDevices
0x00007fffc6046e6f in qemuProcessStop
0x00007fffc6091596 in processMonitorEOFEvent
0x00007fffc6091793 in qemuProcessEventHandler
0x00007ffff7294bf5 in virThreadPoolWorker
0x00007ffff7294184 in virThreadHelper
0x00007ffff3fdc3c4 in start_thread () from /lib64/libpthread.so.0
0x00007ffff3d269cf in clone () from /lib64/libc.so.6
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1446455
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Use a local variable to hold data, rather than accessing the pointer
after calling virMediatedDeviceListAdd (therefore VIR_APPEND_ELEMENT).
Although not causing an issue at the moment, this change is a necessary
prerequisite for tweaking virMediatedDeviceListAdd in a separate patch,
which will take a reference for the source pointer (instead of pointer
value) and will clear it along the way.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
If we are encoding a block of data that is 16 bytes in length,
we cannot leave it as 16 bytes, we must pad it out to the next
block boundary, 32 bytes. Without this padding, the decoder will
incorrectly treat the last byte of plain text as the padding
length, as it can't distinguish padded from non-padded data.
The problem exhibited itself when using a 16 byte passphrase
for a LUKS volume
$ virsh secret-set-value 55806c7d-8e93-456f-829b-607d8c198367 \
$(echo -n 1234567812345678 | base64)
Secret value set
$ virsh start demo
error: Failed to start domain demo
error: internal error: process exited while connecting to monitor: >>>>>>>>>>Len 16
2017-05-02T10:35:40.016390Z qemu-system-x86_64: -object \
secret,id=virtio-disk1-luks-secret0,data=SEtNi5vDUeyseMKHwc1c1Q==,\
keyid=masterKey0,iv=zm7apUB1A6dPcH53VW960Q==,format=base64: \
Incorrect number of padding bytes (56) found on decrypted data
Notice how the padding '56' corresponds to the ordinal value of
the character '8'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
... with VIR_NET_GENERATED_MACV???_PREFIX, which is defined in
util/virnetdevmacvlan.h.
Since VIR_NET_GENERATED_PREFIX is used for plain tap devices, it is
renamed to VIR_NET_GENERATED_TAP_PREFIX and moved to virnetdev.h
MACVTAP_NAME_PREFIX and MACVLAN_NAME_PREFIX could be useful to other
files if they were defined in virnetdevmacvlan.h instead of
virnetdevmacvlan.c, so do that (while slightly renaming them and also
adding yet another #define that chooses between macvlan/macvtap based
on flags).
This is a prerequisite to fix: https://bugzilla.redhat.com/1335798
After 1eb6647979 nobody calls the iohelper with 6 arguments.
Everybody uses the other mode. Well, the only user of iohelper
after the previous commit is virFileWrapperFd really.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Currently we use iohelper for virFDStream implementation. This is
because UNIX I/O can lie sometimes: even though a FD for a
file/block device is set as unblocking, actual read()/write() can
block. To avoid this, a pipe is created and one end is kept for
read/write while the other is handed over to iohelper to
write/read the data for us. Thus it's iohelper which gets blocked
and not our event loop.
This approach has two problems:
1) we are spawning a new process.
2) any exchange of information between daemon and iohelper can be
done only through the pipe.
Therefore, iohelper is replaced with an implementation in thread
which is created just for the stream lifetime. The data are still
transferred through pipe (for now), but both problems described
above are solved.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
While this is no functional change, it makes the code look a bit
nicer. Moreover, it prepares ground for future work.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
There is really no reason why we should have to have 'struct'
everywhere.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>