When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
vmwareMakePath seems to be unused.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.
This fixes the following leak that happened when zero bytes were read:
==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404== at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404== by 0x906F862: getdelim (iogetdelim.c:68)
==404== by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404== by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404== by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)
I recently patches the callers to virPCIDeviceReset() to not call it
if the current driver for a device was vfio-pci (since that driver
will always reset the device itself when appropriate. At the time, Dan
Berrange suggested that I could instead modify virPCIDeviceReset
to check the currently bound driver for the device, and decide
for itself whether or not to go ahead with the reset.
This patch removes the previously added checks, and replaces them with
a check down in virPCIDeviceReset(), as suggested.
The functional difference here is that previously we were deciding
based on either the hostdev configuration or the value of
stubDriverName in the virPCIDevice object, but now we are actually
comparing to the "driver" link in the device's sysfs entry
directly. In practice, both should be the same.
virPCIDeviceGetDriverPathAndName is a static function that will need
to be called by another function that occurs above it in the
file. This patch reorders the static functions so that a forward
declaration isn't needed.
Previously a connection object was required to retrieve the auth
credentials. This patch adds the option to call the retrieval functions
only using the connection URI or path to the configuration file. This
will allow to use this toolkit to request passwords for ssh
authentication in the libssh2 connection driver.
Changes:
*virAuthGetConfigFilePathURI(): use URI to retrieve the config file path
*virAuthGetCredential(): Remove the need to propagate conn object
virAuthGetPasswordPath():
*virAuthGetUsernamePath(): New functions, that use config file path
instead of conn object
https://bugzilla.redhat.com/show_bug.cgi?id=964358
POSIX states that multi-threaded apps should not use functions
that are not async-signal-safe between fork and exec, yet we
were using getpwuid_r and initgroups. Although rare, it is
possible to hit deadlock in the child, when it tries to grab
a mutex that was already held by another thread in the parent.
I actually hit this deadlock when testing multiple domains
being started in parallel with a command hook, with the following
backtrace in the child:
Thread 1 (Thread 0x7fd56bbf2700 (LWP 3212)):
#0 __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007fd5761e7388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fd5761e7257 in __pthread_mutex_lock (mutex=0x7fd56be00360)
at pthread_mutex_lock.c:61
#3 0x00007fd56bbf9fc5 in _nss_files_getpwuid_r (uid=0, result=0x7fd56bbf0c70,
buffer=0x7fd55c2a65f0 "", buflen=1024, errnop=0x7fd56bbf25b8)
at nss_files/files-pwd.c:40
#4 0x00007fd575aeff1d in __getpwuid_r (uid=0, resbuf=0x7fd56bbf0c70,
buffer=0x7fd55c2a65f0 "", buflen=1024, result=0x7fd56bbf0cb0)
at ../nss/getXXbyYY_r.c:253
#5 0x00007fd578aebafc in virSetUIDGID (uid=0, gid=0) at util/virutil.c:1031
#6 0x00007fd578aebf43 in virSetUIDGIDWithCaps (uid=0, gid=0, capBits=0,
clearExistingCaps=true) at util/virutil.c:1388
#7 0x00007fd578a9a20b in virExec (cmd=0x7fd55c231f10) at util/vircommand.c:654
#8 0x00007fd578a9dfa2 in virCommandRunAsync (cmd=0x7fd55c231f10, pid=0x0)
at util/vircommand.c:2247
#9 0x00007fd578a9d74e in virCommandRun (cmd=0x7fd55c231f10, exitstatus=0x0)
at util/vircommand.c:2100
#10 0x00007fd56326fde5 in qemuProcessStart (conn=0x7fd53c000df0,
driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, migrateFrom=0x0, stdin_fd=-1,
stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
flags=1) at qemu/qemu_process.c:3694
...
The solution is to split the work of getpwuid_r/initgroups into the
unsafe portions (getgrouplist, called pre-fork) and safe portions
(setgroups, called post-fork).
* src/util/virutil.h (virSetUIDGID, virSetUIDGIDWithCaps): Adjust
signature.
* src/util/virutil.c (virSetUIDGID): Add parameters.
(virSetUIDGIDWithCaps): Adjust clients.
* src/util/vircommand.c (virExec): Likewise.
* src/util/virfile.c (virFileAccessibleAs, virFileOpenForked)
(virDirCreate): Likewise.
* src/security/security_dac.c (virSecurityDACSetProcessLabel):
Likewise.
* src/lxc/lxc_container.c (lxcContainerSetID): Likewise.
* configure.ac (AC_CHECK_FUNCS_ONCE): Check for setgroups, not
initgroups.
Signed-off-by: Eric Blake <eblake@redhat.com>
Since neither getpwuid_r() nor initgroups() are safe to call in
between fork and exec (they obtain a mutex, but if some other
thread in the parent also held the mutex at the time of the fork,
the child will deadlock), we have to split out the functionality
that is unsafe. At least glibc's initgroups() uses getgrouplist
under the hood, so the ideal split is to expose getgrouplist for
use before a fork. Gnulib already gives us a nice wrapper via
mgetgroups; we wrap it once more to look up by uid instead of name.
* bootstrap.conf (gnulib_modules): Add mgetgroups.
* src/util/virutil.h (virGetGroupList): New declaration.
* src/util/virutil.c (virGetGroupList): New function.
* src/libvirt_private.syms (virutil.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch needs to look up pw_gid; but it is wasteful
to crawl through getpwuid_r twice for two separate pieces
of information, and annoying to copy that much boilerplate
code for doing the crawl. The current internal-only
virGetUserEnt is also a rather awkward interface; it's easier
to just design it to let callers request multiple pieces of
data as needed from one traversal.
And while at it, I noticed that virGetXDGDirectory could deref
NULL if the getpwuid_r lookup fails.
* src/util/virutil.c (virGetUserEnt): Alter signature.
(virGetUserDirectory, virGetXDGDirectory, virGetUserName): Adjust
callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
Recent changes uncovered a NEGATIVE_RETURNS in the return from sysconf()
when processing a for loop in virtTestCaptureProgramExecChild() in
testutils.c
Code review uncovered 3 other code paths with the same condition that
weren't found by Covirity, so fixed those as well.
I had made the change locally, so make check and make syntax-check
were successful, but forgot to add/commit. Unfortunately, git allows a
push when the local directory is dirty, so it didn't catch my mistake.
Eliminate memmove() by using VIR_*_ELEMENT API instead.
In both pci and usb cases, the count that held the size of the list
was unsigned int so it had to be changed to size_t.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Actually, I'm turning this function into a macro as filename,
function name and line number needs to be passed. The new
function virAsprintfInternal is introduced with the extended set
of arguments.
The device bus value was used instead of the device target when
building the sysfs device path. Trivial.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Whenever virPortAllocatorRelease is called with port == 0, it complains
that the port is not in an allowed range, which is expectable as the
port was never allocated. Let's make virPortAllocatorRelease ignore 0
ports in a similar way free() ignores NULL pointers.
When removing a TAP device, the associated bandwidth settings are
removed. Currently, the /sbin/tc is used for that. It is spawned
several times. Moreover, we use the same @cmd variable to
construct the command and its arguments. That means we need to
virCommandFree(cmd); prior to each virCommandNew(TC); which
wasn't done.
On Fedora 18, when cross-compiling to mingw with the mingw*-dbus
packages installed, compilation fails with:
CC libvirt_net_rpc_server_la-virnetserver.lo
In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-connection.h:32:0,
from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-bus.h:30,
from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus.h:31,
from ../../src/util/virdbus.h:26,
from ../../src/rpc/virnetserver.c:39:
/usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-message.h:74:58: error: expected ';', ',' or ')' before 'struct'
I have reported this as a bug against two packages:
- mingw-headers, for polluting the namespace
https://bugzilla.redhat.com/show_bug.cgi?id=980270
- dbus, for not dealing with the pollution
https://bugzilla.redhat.com/show_bug.cgi?id=980278
At least dbus has agreed that a future version of dbus headers will
do s/interface/iface/, regardless of what happens in mingw. But it
is also easy to workaround in libvirt in the meantime, without having
to wait for either mingw or dbus to upgrade.
* src/util/virdbus.h (includes): Undo mingw's pollution so that
dbus doesn't fail.
Signed-off-by: Eric Blake <eblake@redhat.com>
iptablesContext holds only 4 pairs of iptables
(table, chain) and there's no need to pass
it around.
This is a first step towards separating bridge_driver.c
in platform-specific parts.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=971325
The problem was that if virPCIGetVirtualFunctions was given the name
of a non-existent interface, it would return to its caller without
initializing the pointer to the array of virtual functions to NULL,
and the caller (virNetDevGetVirtualFunctions) would try to VIR_FREE()
the invalid pointer.
The final error message before the crash would be:
virPCIGetVirtualFunctions:2088 :
Failed to open dir '/sys/class/net/eth2/device':
No such file or directory
In this patch I move the initialization in virPCIGetVirtualFunctions()
to the begining of the function, and also do an explicit
initialization in virNetDevGetVirtualFunctions, just in case someone
in the future adds code into that function prior to the call to
virPCIGetVirtualFunctions.
The IF_MAXUNIT macro is not present on all BSDs, so
make its use conditional, to avoid breaking OS-X.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The 'in_addr_t' typedef is not present in Mingw64 headers.
Instead we can use the more portable 'struct in_addr' and
then access its 's_addr' field.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When creating a virtual FC HBA with virsh/libvirt API, an error message
will be returned: "error: Node device not found",
also the 'nodedev-dumpxml' shows wrong information of wwpn & wwnn
for the new created device.
Signed-off-by: xschen@tnsoft.com.cn
This reverts f90af69 which switched wwpn & wwwn in the wrong place.
https://www.kernel.org/doc/Documentation/scsi/scsi_fc_transport.txt
Building on FreeBSD had this linker error:
/work/a/ports/devel/libvirt/work/libvirt-1.1.0/src/.libs/libvirt.so:
undefined reference to `virPCIDeviceAddressParse'
This was caused by the new use of virPCIDeviceAddressParse in a
portion of virpci.c that wasn't linux-only (in commit 72c029d8). The
problem was that virPCIDeviceAddressParse had originally been defined
inside #ifdef _linux (because it was only used by another function
that was inside the same ifdef).
The solution is to move it out to the part of virpci.c that is
compiled on all platforms.
(Because the portion that was "moved" was 40-50 lines, but only moved
up by 15 lines, the diff for the patch is less than non-informative -
rather than showing that part that I moved, it shows the bit that was
previously before the moved part, and now sits *after* it.)
Any device which belongs to an "IOMMU group" (used by vfio) will
have links to all devices of its group listed in
/sys/bus/pci/$device/iommu_group/devices;
/sys/bus/pci/$device/iommu_group is actually a link to
/sys/kernel/iommu_groups/$n, where $n is the group number (there
will be a corresponding device node at /dev/vfio/$n once the
devices are bound to the vfio-pci driver)
The following functions are added:
virPCIDeviceGetIOMMUGroupList
Gets a virPCIDeviceList with one virPCIDeviceList for each device
in the same IOMMU group as the provided virPCIDevice (a copy of the
original device object is included in the list.
virPCIDeviceAddressIOMMUGroupIterate
Calls the function @actor once for each device in the group that
contains the given virPCIDeviceAddress.
virPCIDeviceAddressGetIOMMUGroupAddresses
Fills in a virPCIDeviceAddressPtr * with an array of
virPCIDeviceAddress, one for each device in the iommu group of the
provided virPCIDeviceAddress (including a copy of the original).
virPCIDeviceAddressGetIOMMUGroupNum
Returns the group number as an int (a valid group number will always
be 0 or greater). If there is no iommu_group link in the device's
directory (usually indicating that vfio isn't loaded), -2 will be
returned. On any real error, -1 will be returned.
We only break out of the while loop if *content is an empty string.
However the buffer has been allocated to BUFSIZ + 1 (8193 in my case),
but it gets overwritten in the next for iteration.
Move VIR_FREE right before we overwrite it to avoid the leak.
==5777== 16,386 bytes in 2 blocks are definitely lost in loss record 1,022 of 1,027
==5777== by 0x5296E28: virReallocN (viralloc.c:184)
==5777== by 0x52B0C66: virFileReadLimFD (virfile.c:1137)
==5777== by 0x52B0E1A: virFileReadAll (virfile.c:1199)
==5777== by 0x529B092: virCgroupGetValueStr (vircgroup.c:534)
==5777== by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)
Introduced by 83e4c77.
https://bugzilla.redhat.com/show_bug.cgi?id=978352
Don't check for '\n' at the end of file if zero bytes were read.
Found by valgrind:
==404== Invalid read of size 1
==404== at 0x529B09F: virCgroupGetValueStr (vircgroup.c:540)
==404== by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)
==404== by 0x1EB475: qemuSetupCgroupForEmulator (qemu_cgroup.c:1061)
==404== by 0x1D9489: qemuProcessStart (qemu_process.c:3801)
==404== by 0x18557E: qemuDomainObjStart (qemu_driver.c:5787)
==404== by 0x190FA4: qemuDomainCreateWithFlags (qemu_driver.c:5839)
Introduced by 0d0b409.
https://bugzilla.redhat.com/show_bug.cgi?id=978356
The "fix" I pushed a few commits ago would still leak a virPCIDevice
in case of an OOM error. Although it's inconsequential in practice,
this patch satisfies my OCD.
The same strings were being re-created multiple times just to save
declaring a new variable. In the meantime, the use of the generic
variable names led to confusion when trying to follow the code. This
patch creates strings for:
stubDriverName (was called "driver" in original args)
stubDriverPath ("/sys/bus/pci/drivers/${stubDriverName}")
driverLink ("${device}/driver")
oldDriverName (the final component of path linked to by
"${device}/driver")
oldDriverPath ("/sys/bus/pci/drivers/${oldDriverName}")
then re-uses them as necessary.
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it.
Commit 861d40565 added code (my personal change to "clean up" the
submitter's code, *not* the fault of the submitter) that dereferenced
virtVlan without first checking for NULL. This patch fixes that and,
as part of the fix, cleans up some unnecessary obtuseness.
virNetDevBridgeSetSTPDelay accepts delay in milliseconds,
but BSD implementation was expecting seconds. Therefore,
it was working correctly only with delay == 0.
This patch adds functionality to allow libvirt to configure the
'native-tagged' and 'native-untagged' modes on openvswitch networks.
Signed-off-by: Laine Stump <laine@redhat.com>
All APIs that take typed parameters are only using params address in
their entry point debug messages. With the new VIR_TYPED_PARAMS_DEBUG
macro, all functions can easily log all individual typed parameters
passed to them.
When unsupported parameter is passed to virTypedParamsValidate,
VIR_ERR_ARGUMENT_UNSUPPORTED should be returned rather than
VIR_ERR_INVALID_ARG, which is more appropriate for supported parameters
used incorrectly.
virPCIDeviceDetach would previously sometimes consume the input device
object (to put it on the inactive list) and sometimes not. Avoiding
memory leaks required checking beforehand to see if the device was
already on the list, and freeing the device object in the caller only
if there wasn't already an identical object on the inactive list.
This patch makes it consistent - virPCIDeviceDetach will *never*
consume the input virPCIDevice object; if it needs to put one on the
inactive list, it will create a copy and put *that* on the list. This
way the caller knows that it is always their responsibility to free
the device object they created.
virPCIDeviceReattach was making the assumption that the dev object
given to it was one and the same with the dev object on the
inactiveDevs list. If that had been the case, it would not need to
free the dev object it removed from the inactive list, because the
caller of virPCIDeviceReattach always frees the dev object that it
passes in. Since the dev object passed in is *never* the same object
that's on the list (it is a different object with the same name and
attributes, created just for the purpose of searching for the actual
object), simply doing a "ListSteal" to remove the object from the list
results in one leaked object; we need to actually free the object
after removing it from the list.
* virPCIDeviceFindByIDs - find a device on a list w/o creating an object
This makes searching for an existing device on a list lighter weight.
* virPCIDeviceCopy - make a copy of an existing virPCIDevice object.
* virPCIDeviceGetDriverPathAndName - construct new strings containing
1) the name of the driver bound to this device.
2) the full path to the sysfs config for that driver.
(This code was lifted from virPCIDeviceUnbindFromStub, and replaced
there with a call to this new function).
Previously stubDriver was always set from a string literal, so it was
okay to use a const char * that wasn't freed when the virPCIDevice was
freed. This will not be the case in the near future, so it is now a
char* that is allocated in virPCIDeviceSetStubDriver() and freed
during virPCIDeviceFree().
This patch introduces the virAccessManagerPtr class as the
interface between virtualization drivers and the access
control drivers. The viraccessperm.h file defines the
various permissions that will be used for each type of object
libvirt manages
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It's not used anywhere except for the switch in
virStorageBackendCreateQemuImgOpts, where leaving it in causes
a dead code coverity warning and omitting it breaks compilation
because of unhandled enum value.
Introduced by 6298f74.
Detect qcow2 images with version 3 in the image header as
VIR_STORAGE_FILE_QCOW2.
These images have a feature bitfield, with just one feature supported
so far: lazy_refcounts.
The header length changed too, moving the location of the backing
format name.
Call virLogVMessage instead of virLogMessage, since libudev
called us with a va_list object, not a list of arguments.
Honor message priority and strip the trailing newline.
https://bugzilla.redhat.com/show_bug.cgi?id=969152
virProcessGetNamespaces() opens files in /proc/XXX/ns/ which will
later be passed to setns(). We have to make sure that the file
descriptors in the array are in the correct order. In particular
the 'user' namespace must be first otherwise setns() may fail
for other namespaces.
The order has been taken from util-linux's sys-utils/nsenter.c
Also we must ignore EINVAL in setns() which occurs if the
namespace associated with the fd, matches the calling process'
current namespace.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Such as on FreeBSD. Broken in commit aa2a4cff7.
* src/util/virstoragefile.c (virStorageFileResize): Add missing ';',
mark conditionally unused variables.
Signed-off-by: Eric Blake <eblake@redhat.com>
The document for "vol-resize" says the new capacity will be sparse
unless "--allocate" is specified, however, the "--allocate" flag
is never implemented. This implements the "--allocate" flag for
fs backend's raw type volume, based on posix_fallocate and the
syscall SYS_fallocate.
We can't use GNULIB's fprintf-posix due to licensing
incompatibilities. We do already have a portable
formatting via virAsprintf() which we got from GNULIB
though. We can use to create a virFilePrintf() function.
But really gnulib could just provide a 'fprintf'
module, that depended on just its 'asprintf' module.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I noticed several unusual spacings in for loops, and decided to
fix them up. See the next commit for the syntax check that found
all of these.
* examples/domsuspend/suspend.c (main): Fix spacing.
* python/libvirt-override.c: Likewise.
* src/conf/interface_conf.c: Likewise.
* src/security/virt-aa-helper.c: Likewise.
* src/util/virconf.c: Likewise.
* src/util/virhook.c: Likewise.
* src/util/virlog.c: Likewise.
* src/util/virsocketaddr.c: Likewise.
* src/util/virsysinfo.c: Likewise.
* src/util/viruuid.c: Likewise.
* src/vbox/vbox_tmpl.c: Likewise.
* src/xen/xen_hypervisor.c: Likewise.
* tools/virsh-domain-monitor.c (vshDomainStateToString): Drop
default case, to let compiler check us.
* tools/virsh-domain.c (vshDomainVcpuStateToString): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
When src is NULL, VIR_STRDUP will return 0 directly.
This patch will set dest to NULL before VIR_STRDUP return.
Example:
[root@yds-pc libvirt]# virsh
Welcome to virsh, the virtualization interactive terminal.
Type: 'help' for help with commands
'quit' to quit
virsh # connect
error: Failed to connect to the hypervisor
error: internal error Unable to parse URI �N�*
Signed-off-by: yangdongsheng <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
With previous patch, we accept negative value as length of string to
duplicate. So there is no need to pass strlen(src) in case we want to do
duplicate the whole string.
It may shorten the code a bit as the following pattern:
VIR_STRNDUP(dst, src, cond ? n : strlen(src))
is used on several places among our code. However, we can
move the strlen into virStrndup and thus write just:
VIR_STRNDUP(dst, src, cond ? n : -1)
Currently, the controllers argument to virCgroupDetect acts both as
a result filter and a required controller specification, which is
a bit overloaded. If both functionalities are needed, it would be
better to have them seperated into a filter and a requirement mask.
The only situation where it is used today is to ensure that only
CPU related controllers are used for the VCPU directories. But here
we clearly do not want to enforce the existence of cpu, cpuacct and
specifically not cpuset at the same time.
This commit changes the semantics of controllers to "filter only".
Should a required mask ever be needed, more work will have to be done.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Within whole vircgroup.c we 'return -errno', e.g. 'return -ENOMEM'.
However, in this specific function virCgroupAddTaskStrController
we weren't returning -ENOMEM but -1 despite fact that later in
the function we are returning one of errno values indeed.
In my previous patches I enabled the IFF_MULTI_QUEUE flag every
time the user requested multiqueue TAP device. However, this
works only at runtime. During build time the flag may be
undeclared.
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
https://bugzilla.redhat.com/show_bug.cgi?id=965169 documents a
problem starting domains when cgroups are enabled; I was able
to reliably reproduce the race about 5% of the time when I added
hooks to domain startup by 3 seconds (as that seemed to be about
the length of time that qemu created and then closed a temporary
thread, probably related to aio handling of initially opening
a disk image). The problem has existed since we introduced
virCgroupMoveTask in commit 9102829 (v0.10.0).
There are some inherent TOCTTOU races when moving tasks between
kernel cgroups, precisely because threads can be created or
completed in the window between when we read a thread id from the
source and when we write to the destination. As the goal of
virCgroupMoveTask is merely to move ALL tasks into the new
cgroup, it is sufficient to iterate until no more threads are
being created in the old group, and ignoring any threads that
die before we can move them.
It would be nicer to start the threads in the right cgroup to
begin with, but by default, all child threads are created in
the same cgroup as their parent, and we don't want vcpu child
threads in the emulator cgroup, so I don't see any good way
of avoiding the move. It would also be nice if the kernel were
to implement something like rename() as a way to atomically move
a group of threads from one cgroup to another, instead of forcing
a window where we have to read and parse the source, then format
and write back into the destination.
* src/util/vircgroup.c (virCgroupAddTaskStrController): Ignore
ESRCH, because a thread ended between read and write attempts.
(virCgroupMoveTask): Loop until all threads have moved.
Signed-off-by: Eric Blake <eblake@redhat.com>
Resolves:https://bugzilla.redhat.com/show_bug.cgi?id=927620
#kill -STOP `pidof qemu-kvm`
#virsh destroy $guest --graceful
error: Failed to destroy domain testVM
error: An error occurred, but the cause is unknown
With --graceful, SIGTERM always is emitted to kill driver
process, but it won't success till burning out waiting time
in case of process being stopped.
But domain destroy without --graceful can work, SIGKILL will
be emitted to the stopped process after 10 secs which always
kills a process even one that is currently stopped.
So report an error after burning out waiting time in this case.
Change bbe97ae968 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In an upcoming patch, I need the way to safely transfer a nested
virJSON object out of its parent container for independent use,
even after the parent is freed.
* src/util/virjson.h (virJSONValueObjectRemoveKey): New function.
(_virJSONObject, _virJSONArray): Use correct type.
* src/util/virjson.c (virJSONValueObjectRemoveKey): Implement it.
* src/libvirt_private.syms (virjson.h): Export it.
* tests/jsontest.c (mymain): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>