libvirt wrongly assumes that VF netdev has to have the
netdev assigned to PF. There is no such requirement in SRIOV standard.
This patch change the virNetDevSwitchdevFeature() function to deal
with SRIOV devices which does not have netdev on PF. Also corrects
one comment about PF netdev assumption.
One example of such devices is ThunderX VNIC.
By applying this change, VF device is used for virNetlinkCommand() as
it is the only netdev assigned to VNIC.
Signed-off-by: Radoslaw Biernacki <radoslaw.biernacki@linaro.org>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 7282f455a got rid of the VIR_WARNINGS_NO_CAST_ALIGN macro
when refactoring the code and broke the build with clang.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
I had intended to make these changes to commit d40b820c before
pushing, but forgot about it during the day between the initial review
and ACK.
Neither change is significant - just returning immediately when
virNetDevGetName() fails (instead of logging a debug message first)
and eliminating a comment that adds to confusion rather than
eliminating it. Still, the changes should be made to be more
consistent with nearly identical code just a few lines up (added in
commit 7282f455)
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When checking the setting of accept_ra, we have assumed that all
routes have a single nexthop, so the interface of the route would be
in the RTA_OIF attribute of the netlink RTM_NEWROUTE message. But
multipath routes don't have an RTA_OIF; instead, they have an
RTA_MULTIPATH attribute, which is an array of rtnexthop, with each
rtnexthop having an interface. This patch adds a loop to look at the
setting of accept_ra of the interface for every rtnexthop in the
array.
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This is about the same number of code lines, but is simpler, and more
consistent with what will be added to check another attribute in a
coming patch.
As a side effect, it
Resolves: https://bugzilla.redhat.com/1583131
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This same operation needs to be done in multiple places, so move the
inline code into a separate function.
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This is problematic if a callback function wants to send the nlmsghdr
to a library function that has no "const" in its prototype
(e.g. nlmsg_find_attr())
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Essentially, bring back the old behaviour as of commit eba36a38 which
was later changed by commit ae06048bf5. Even though all the stderr
messages will eventually end up in the journal, we're not making use of
the fields journald provides.
https://bugzilla.redhat.com/show_bug.cgi?id=1592644
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
In 600462834f we've tried to remove Author(s): lines
from comments at the beginning of our source files. Well, in some
files while we removed the "Author" line we did not remove the
actual list of authors.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This test checks if security label remembering works correctly.
It uses qemuSecurity* APIs to do that. And some mocking (even
though it's not real mocking as we are used to from other tests
like virpcitest). So far, only DAC driver is tested.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The arguments to the N_() macro must only ever be a literal string. It
is not possible to use macro arguments, or use macro string
concatenation in this context. The N_() macro is a no-op whose only
purpose is to act as a marker for xgettext when it extracts translatable
strings from the source code. Anything other than a literal string will
be silently ignored by xgettext.
Unfortunately this means that the clever MSG, MSG2 & MSG_EXISTS macros
used for building up error message strings have prevented any of the
error messages getting marked for translation. We must sadly, revert to
a more explicit listing of strings for now.
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The QEMU command line arguments are very long and currently all written
on a single line to /var/log/libvirt/qemu/$GUEST.log. This introduces
logic to add line breaks after every env variable and "-" optional
argument, and every positional argument. This will create a clearer log
file, which will in turn present better in bug reports when people cut +
paste from the log into a bug comment.
An example log file entry now looks like this:
2018-12-14 12:57:03.677+0000: starting up libvirt version: 5.0.0, qemu version: 3.0.0qemu-3.0.0-1.fc29, kernel: 4.19.5-300.fc29.x86_64, hostname: localhost.localdomain
LC_ALL=C \
PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin \
HOME=/home/berrange \
USER=berrange \
LOGNAME=berrange \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-ppc64 \
-name guest=guest,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/home/berrange/.config/libvirt/qemu/lib/domain-33-guest/master-key.aes \
-machine pseries-2.10,accel=tcg,usb=off,dump-guest-core=off \
-m 1024 \
-realtime mlock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid c8a74977-ab18-41d0-ae3b-4041c7fffbcd \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=23,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device qemu-xhci,id=usb,bus=pci.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2018-12-14 12:57:03.730+0000: shutting down, reason=failed
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The virCommand APIs do not expect to be given a NULL value for an arg
name or value. Such a mistake can lead to execution of the wrong
command, as the NULL may prematurely terminate the list of args.
Detect this and report suitable error messages.
This identified a flaw in the storage test which was passing a NULL
instead of the volume path. This flaw was then validated by an incorrect
set of qemu-img args as expected data.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Simplify adding of new errors by just adding them to the array of
messages rather than having to add conversion code.
Additionally most of the messages add the format string part as a suffix
so we can avoid some of the duplication by using a macro which adds the
suffix to the original string. This way most messages fit into the 80
column limit and only 3 exceed 100 colums.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Erik Skultety <eskultet@redhat.com>
Clarify how @info is used and what the returned values look like.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Simplify wording of the error string for VIR_ERR_OPEN_FAILED and
VIR_ERR_CALL_FAILED. The error codes itself are currently unused so it
will not impact any client.
This will simplify upcomming patch which refactors how we convert these.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Few error codes were missing the version of the message with additional
info. In case of the modified messages it's not very likely they'll ever
report any additional data, but for the sake of consistency we should
provide them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Additional information for an error message is either in form of a
string or empty. Fix two offenders. One used %d as the format modifier
and the second one always expected a string.
Thankfully, neither of the offenders are currently in effect.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
We do have one for the error domain but not for the error number itself.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Require that all headers are guarded by a symbol named
LIBVIRT_$FILENAME
where $FILENAME is the uppercased filename, with all characters
outside a-z changed into '_'.
Note we do not use a leading __ because that is technically a
namespace reserved for the toolchain.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This introduces a syntax-check script that validates header files use a
common layout:
/*
...copyright header...
*/
<one blank line>
#ifndef SYMBOL
# define SYMBOL
....content....
#endif /* SYMBOL */
For any file ending priv.h, before the #ifndef, we will require a
guard to prevent bogus imports:
#ifndef SYMBOL_ALLOW
# error ....
#endif /* SYMBOL_ALLOW */
<one blank line>
The many mistakes this script identifies are then fixed.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When libvirt is reconnecting to running domain that uses cgroup v2
the QEMU process reports cgroup for the emulator directory because the
main thread is in that cgroup. We need to remove the "/emulator" part
in order to match with the root cgroup directory name for that domain.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The rewrite to support cgroup v2 missed this function. In cgroup v2
we have different files to track tasks.
We would fail to remove cgroup on non-systemd OSes if there is any
extra process assigned to guest cgroup because we would not kill any
process form the guest cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contribitions.
In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.
With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to find the
author of a particular bit of code.
This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.
The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Most of the iptables APIs share code for the add/delete paths, but a
couple were separated. Merge the remaining APIs to facilitate future
changes.
Reviewed-by: Laine Stump <laine@laine.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The function clears and frees the passed buffers on success, but not in
one case of failure. Modify the control flow that the args are always
consumed, record it in the docs and remove few pointless cleanup paths
in callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Up until now, we formatted 'rendernode=' onto QEMU cmdline only if the
user specified it in the XML, otherwise we let QEMU do it for us. This
causes permission issues because by default the /dev/dri/renderDX
permissions are as follows:
crw-rw----. 1 root video
There's literally no reason why it shouldn't be libvirt picking the DRM
render node instead of QEMU, that way (and because we're using
namespaces by default), we can safely relabel the device within the
namespace.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This is the first step towards libvirt picking the first available
render node instead of QEMU. It also makes sense for us to be able to do
that, since we allow specifying the node directly for SPICE, so if
there's no render node specified by the user, we should pick the first
available one. The algorithm used for that is essentially the same as
the one QEMU uses.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The call of virResctrlMonitorGetStats will allocate the memory for
holding cache occupancy or memory bandwidth statistics.
This patch adds the function virResctrlMonitorFreeStats as the
opposing action of virResctrlMonitorGetStats to free the memory.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Return a list of virResctrlMonitorStatsPtr instead of
a virResctrlMonitorStats array in virResctrlMonitorGetStats.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
There are certain cases e.g. containers where the sysfs path might
exists, but might fail. Unfortunately the exact restrictions are only
known to libvirt when trying to write to it so we need to try it.
But in case it fails there is no need to fully abort, in those cases try
to fall back to the older ioctl interface which can still work.
That makes setting up a bridge in unprivileged LXD containers work.
Fixes: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1802906
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Reported-by: Brian Candler <b.candler@pobox.com>
The path argument of virFileIsDir should be a full name
of file, pathname and filename. Fixed it by passing the
full path name to virFileIsDir.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since the functions only return 0 or 1, they should return bool. I missed the
change when "refactoring" the first commit.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Both virProcessRunInMountNamespace() and virProcessRunInFork()
look very similar. De-duplicate the code and make
virProcessRunInMountNamespace() call virProcessRunInFork().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This new helper can be used to spawn a child process and run
passed callback from it. This will come handy esp. if the
callback is not thread safe.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We were mistakenly skipping virZPCIDeviceAddressIsEmpty() and
virZPCIDeviceAddressIsValid() when compiling on non-Linux,
which unsurprisingly ended up causing linking failures later
in the build process.
Clue-stick-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
This patch introduces new XML parser/formatter functions. Uid is
16-bit and non-zero. Fid is 32-bit. They are the two attributes of zpci
which is introduced as PCI address element. Zpci element is parsed and
formatted along with PCI address. And add the related test cases.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch introduces PCI address extension flag for virDomainDeviceInfo
and virPCIDeviceAddress. The extension flag in virDomainDeviceInfo is
used internally during calculating PCI extension flag. The one in
virPCIDeviceAddress is the duplicate to indicate extension address is
being used. Currently only zPCI extension address is introduced to deal
with 'uid' and 'fid' on the S390 platform.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Add zPCI definitions in preparation of extending the PCI address
with parameters uid (user-defined identifier) and fid (PCI function
identifier).
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Commit 901d2b9c introduced virCgroupGetMemoryStat and replaced
the LXC virLXCCgroupGetMemStat logic in commit e634c7cd0. However,
in doing so the replacement wasn't exact as the LXC logic used
getline() to process the cgroup controller data, while the new
virCgroupGetMemoryStat used "memory.stat" manual buffer read/
processing which neglected to forward through @line in order
to read each line in the output.
To fix that, we should be sure to carry forward the @line value
for each line read updating it beyond that current @newLine value
once we've calculated the values that we want.
Signed-off-by: Peter Chubb <peter.chubb@data61.csiro.au>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This reverts commit ccc72d5cbd.
Based on upstream comment to a follow-up patch, this didn't take the
right approach and the right thing to do is revert and rework.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Missed during review and surprisingly my run through Coverity also
didn't see this. I only noticed it when reading the code while fixing
the build breaker for commit 36780a86a.
With all those continues we would leak @stats.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add interfaces monitor group to support operations such
as GetID, SetID, Remove, SetAlloc, etc.
Implement the internal virResctrlMonitorGetStats to fetch all
the statistical data and the virResctrlMonitorGetCacheOccupancy
in order to fetch the cache specific "llc_occupancy" value.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Refactor virResctrlAllocSetID generating an error if an attempt
is made to overwrite the existing value.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add interface for creating the resource monitoring group according
to '@virResctrlMonitor->path'.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The code for creating resctrl allocation group could be reused
for monitoring group, refactor it for reuse in the later patch.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The code of adding PID to the allocation could be reused, refactor it
for later reuse.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add interface for resctrl monitor to determine the path.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The code for determining resctrl allocation path could be reused
for monitor. Refactor it for reuse.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Cache Monitoring Technology (aka CMT) provides the capability
to report cache utilization information of system task.
This patch introduces the concept of resctrl monitor through
data structure virResctrlMonitor.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Refactor schemas and virresctrl to support optional <cache> element
in <cachetune>.
Later, the monitor entry will be introduced and to be placed
under <cachetune>. Either cache entry or monitor entry is
an optional element of <cachetune>.
An cachetune has no <cache> element is taking the default resource
allocating policy defined in '/sys/fs/resctrl/schemata'.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When no server name is provided in the URI, modern versions of libxml2
will set the port to '-1'. This is a change from behaviour with earlier
versions which set it to 0.
Libvirt expects the port to be 0 in these cases and as a result we get a
bug when connecting to URIs which lack a server name:
$ virsh -c test+ssh:///default list
error: failed to connect to the hypervisor
error: Cannot recv data: Bad port '-1': Connection reset by peer
This libxml2 change was attempting to fix another bug identified by
libvirt where it didn't roundtrip URIs correctly in:
beb7281055
Essentially libxml2 was not expecting apps to look at the URI port
field when the server name is not provided. This was a reasonable
assumption, but none the less libvirt did look at it :-)
The fix is to ensure we explicitly set port to 0 when server name
is not present, avoiding undefined behaviour for the port field in
libxml2.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1631606
Changes made to manage and utilize a secondary connection
driver to APIs outside the scope of the primary connection
driver have resulted in some confusion processing polkit rules
since the simple "access denied" error message doesn't provide
enough of a clue when combined with the "authentication failed:
access denied by policy" as to which connection driver refused
or failed the ACL check.
In order to provide some context, let's modify the existing
"access denied" error returne from the various vir*EnsureACL
API's to provide the connection driver name that is causing
the failure. This should provide the context for writing the
polkit rules that would allow access via the driver.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Adjusting domain format documentation, adding device address
support and adding command line generation for vfio-ap.
Since only one mediated hostdev with model vfio-ap is supported a check
disallows to define domains with more than one such hostdev device.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Chris Venteicher <cventeic@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1640465
Weirdly enough, there can be symlinks in the path we are trying
to fix. If it is the case our clever algorithm that finds matches
against mount table won't work. Canonicalize path at the
beginning then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The virFileInData() function should return to the caller if the
current position the passed file is in is a data section or a
hole (and also how long the current section is). At any rate,
upon return from this function (be it successful or not) the
original position in the file is restored. This may mess up with
errno which might have been set earlier. Save the errno into a
local variable so it can be restored for the caller's sake.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The URI parser used by libvirt does not populate uri->path if the
trailing slash is missing. The code virStorageSourceParseBackingURI
would then not populate src->path.
As only NBD network disks are allowed to have the 'name' field in the
XML defining the disk source omitted we'd generate an invalid XML which
we'd not parse again.
Fix it by populating src->path with an empty string if the uri is
lacking slash.
As pointed out above NBD is special in this case since we actually allow
it being NULL. The URI path is used as export name. Since an empty
export does not make sense the new approach clears the src->path if the
trailing slash is present but nothing else.
Add test cases now to cover all the various cases for NBD and non-NBD
uris as there was to time only 1 test abusing the quirk witout slash for
NBD and all other URIs contained the slash or in case of NBD also the
export name.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The name is misleading. Change it to 'uristr' so that 'path' can be
reused in the proper context later.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
There are couple of things wrong with the current implementation.
The first one is that in the first loop the code tries to build a
list of fuse.glusterfs mount points. Well, since the strings are
allocated in a temporary buffer and are not duplicated this
results in wrong decision made later in the code.
The second problem is that the code does not take into account
subtree mounts. For instance, if there's a fuse.gluster mounted
at /some/path and another FS mounted at /some/path/subdir the
code would not recognize this subdir mount.
Reported-by: Han Han <hhan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
If the given path is already a mount point (e.g. a bind mount of
a file, or simply a direct mount point of a FS), then our code
fails to detect that because the first thing it does is cutting
off part after last slash '/'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
On s390x the struct member f_type of statsfs is hard coded to 'unsigned
int'. Change virFileIsSharedFixFUSE() to take a 'long long int' and use
a temporary to avoid pointer-casting.
This fixes the following error:
../../src/util/virfile.c:3578:38: error: cast increases required alignment of target type [-Werror=cast-align]
virFileIsSharedFixFUSE(path, (long *) &sb.f_type);
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Bjoern Walk <bwalk@linux.ibm.com>
virFileReadValueUint does not log errors for non-existient files,
it merely returns -2.
Commit 12093f1 introduced this.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
This enables to use both cgroup v1 and v2 at the same time together
with libvirt. It is supported by kernel and there is valid use-case,
not all controllers are implemented in cgroup v2 so there might be
configurations where administrator would enable these missing
controllers in cgroup v1.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In order to set CPU cfs period using cgroup v2 'cpu.max' interface
we need to load the current value of CPU cfs quota first because
format of 'cpu.max' interface is '$quota $period' and in order to
change 'period' we need to write 'quota' as well. Writing only one
number changes only 'quota'.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In cgroups v2 we need to handle threads and processes differently.
If you need to move a process you need to write its pid into
cgrou.procs file and it will move the process with all its threads
as well. The whole process will be moved if you use tid of any thread.
In order to move only threads at first we need to create threaded group
and after that we can write the relevant thread tids into cgroup.threads
file. Threads can be moved only into cgroups that are children of
cgroup of its process.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When creating cgroup hierarchy we need to enable controllers in the
parent cgroup in order to be usable. That means writing "+{controller}"
into cgroup.subtree_control file. We can enable only controllers that
are enabled for parent cgroup, that means we need to do that for the
whole cgroup tree.
Cgroups for threads needs to be handled differently in cgroup v2. There
are two types of controllers:
- domain controllers: these cannot be enabled for threads
- threaded controllers: these can be enabled for threads
In addition there are multiple types of cgroups:
- domain: normal cgroup
- domain threaded: a domain cgroup that serves as root for threaded
cgroups
- domain invalid: invalid cgroup, can be changed into threaded, this
is the default state if you create subgroup inside
domain threaded group or threaded group
- threaded: threaded cgroup which can have domain threaded or
threaded as parent group
In order to create threaded cgroup it's sufficient to write "threaded"
into cgroup.type file, it will automatically make parent cgroup
"domain threaded" if it was only "domain". In case the parent cgroup
is already "domain threaded" or "threaded" it will modify only the type
of current cgroup. After that we can enable threaded controllers.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Cgroup v2 has only single mount point for all controllers. The list
of controllers is stored in cgroup.controllers file, name of controllers
are separated by space.
In cgroup v2 there is no cpuacct controller, the cpu.stat file always
exists with usage stats.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If the placement was copied from parent or set to absolute path
there is nothing to do, otherwise set the placement based on
process placement from /proc/self/cgroup or /proc/{pid}/cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When reconnecting to a domain we are validating the cgroup name.
In case of cgroup v2 we need to validate only the new format for host
without systemd '{machinename}.libvirt-{drivername}' or scope name
generated by systemd.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We cannot detect only mount points to figure out whether cgroup v2
is available because systemd uses cgroup v2 for process tracking and
all controllers are mounted as cgroup v1 controllers.
To make sure that this is no the situation we need to check
'cgroup.controllers' file if it's not empty to make sure that cgroup
v2 is not mounted only for process tracking.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Place cgroup v2 backend type before cgroup v1 to make it obvious
that cgroup v2 is preferred implementation.
Following patches will introduce support for hybrid configuration
which will allow us to use both at the same time, but we should
prefer cgroup v2 regardless.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1632711
GlusterFS is typically safe when it comes to migration. It's a
network FS after all. However, it can be mounted via FUSE driver
they provide. If that is the case we fail to identify it and
think migration is not safe and require VIR_MIGRATE_UNSAFE flag.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Commit 87a8a30d6 added the function based on the virsh function,
but used an unsigned long long instead of a double and thus that
limits the maximum result.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
This reverts commit 1602aa28f8.
There is no need to call virCgroupRemove() nor virCgroupFree() if
virCgroupEnableMissingControllers() fails because it will not modify
'group' at all.
The cleanup of directories is done in virCgroupMakeGroup().
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Cgroups are linux specific and we need to make sure that the code is
compiled only on linux. On different OSes it fails the compilation:
../../src/util/vircgroupv1.c:65:19: error: variable has incomplete type 'struct mntent'
struct mntent entry;
^
../../src/util/vircgroupv1.c:65:12: note: forward declaration of 'struct mntent'
struct mntent entry;
^
../../src/util/vircgroupv1.c:74:12: error: implicit declaration of function 'getmntent_r' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
while (getmntent_r(mounts, &entry, buf, sizeof(buf)) != NULL) {
^
../../src/util/vircgroupv1.c:814:39: error: use of undeclared identifier 'MS_NOSUID'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:814:49: error: use of undeclared identifier 'MS_NODEV'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:814:58: error: use of undeclared identifier 'MS_NOEXEC'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:841:65: error: use of undeclared identifier 'MS_BIND'
if (mount(src, group->legacy[i].mountPoint, "none", MS_BIND,
^
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
All the system headers are used only if we are compiling on linux
and they all are present otherwise we would have seen build errors
because in our tests/vircgrouptest.c we use only __linux__ to check
whether to skip the cgroup tests or not.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
tests/vircgrouptest.c uses #ifdef __linux__ for a long time and no
failure was reported so far so it's safe to assume that __linux__ is
good enough to guard cgroup code.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Use the mnemonic macros of libdbus for 1 (TRUE) and 0 (FALSE).
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Report a debug message if dbus_watch_handle() returns FALSE.
dbus_watch_handle() returns FALSE if there wasn't enough memory for
reading or writing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
As documented at
https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga2522ac5075dfe0a1535471f6e045e1ee
the creator of a non-shared D-Bus connection has to release the last
reference after closing for freeing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Grab a ref for info->bus (a DBus connection) as long as the while loop
is running. With the grabbed reference it is ensured that info->bus
isn't freed as long as the while loop is executed. This is necessary
as it's allowed to drop the last ref for the bus connection in a
handler.
There was already a bug of this kind in libdbus itself:
https://bugs.freedesktop.org/show_bug.cgi?id=15635.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
With the introduction of cgroup v2 there are new names used with
cgroups based on which version is used:
- legacy: cgroup v1
- unified: cgroup v2
- hybrid: cgroup v1 and cgroup v2
Let's use 'legacy' instead of 'cgroupv1' or 'controllers' in our code.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
They all need virCgroupV1GetMemoryUnlimitedKB() so it's easier to
move them in one commit.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>